Reinforcement-Based Web Crawler

نویسندگان

  • Santiago Manuel A. Perez
  • Rachel Edita O. Roxas
چکیده

This paper presents a focused web crawler system which automatically creates a minority language corpora. The system uses a database of relevant and irrelevant documents testing the relevance of retrieved web documents. The system requires a starting web document to indicate where the search would begin.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning to Surface Deep Web Content

We propose a novel deep web crawling framework based on reinforcement learning. The crawler is regarded as an agent and deep web database as the environment. The agent perceives its current state and submits a selected action (query) to the environment according to Q-value. Based on the framework we develop an adaptive crawling method. Experimental results show that it outperforms the state of ...

متن کامل

Focused Crawling Using Temporal Difference-Learning

This paper deals with the problem of constructing an intelligent Focused Crawler, i.e. a system that is able to retrieve documents of a specific topic from the Web. The crawler must contain a component which assigns visiting priorities to the links, by estimating the probability of leading to a relevant page in the future. Reinforcement Learning was chosen as a method that fits this task nicely...

متن کامل

YAFC: Yet Another Focused Crawler

As the Web continues to grow rapidly, focused topic-specific Web crawlers will gain popularity over traditional general-purpose search engines for locating, indexing and keeping up to date information on the Web. This paper presents YAFC (Yet Another Focused Crawler), a neurodynamic programming approach to focused crawling. YAFC combines TD(λ) reinforcement learning with a neural network to lea...

متن کامل

Reinforcement Learning with Classifier Selection for Focused Crawling

Focused crawlers are programs that wander in the Web, using its graph structure, and gather pages that belong to a specific topic. The most critical task in Focused Crawling is the scoring of the URLs as it designates the path that the crawler will follow, and thus its effectiveness. In this paper we propose a novel scheme for assigning scores to the URLs, based on the Reinforcement Learning (R...

متن کامل

A Framework for Deep Web Crawler Using Genetic Algorithm

The Web has become one of the largest and most readily accessible repositories of human knowledge. The traditional search engines index only surface Web whose pages are easily found. The focus has now been moved to invisible Web or hidden Web, which consists of a large warehouse of useful data such as images, sounds, presentations and many other types of media. To use such data, there is a need...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004